Abstract

We study one dimension in program evolution, namely the evolution of the datatype declarations in a program. To this end, a suite of basic transformation operators is designed. We cover structure-preserving refactorings, but also structure-extending and -reducing adaptations. Both the object programs that are subject to datatype transformations and the meta programs that encode datatype transformations are functional programs.

1 Introduction

We study operators for the transformation of the datatype declarations in a program. The presentation will be biased towards the algebraic datatypes in Haskell, but the concepts are of relevance for many typed declarative languages, e.g., Mercury and SML, as well as frameworks for algebraic specification or rewriting like ASF+SDF, CASL, Elan, and Maude. Our transformations are rather syntactical in nature as opposed to more semantical concepts such as data refinement. Our transformations contribute to the more general notion of functional program refactoring [TR01].



The following introductory example is about extracting a new datatype from constructor components of an existing datatype. This is illustrated with datatypes that represent the syntax of an imperative language. The following extraction identifies a piece of syntax to enable its reuse in later syntax extensions:

-- Datatypes with focus on two constructor components
data Prog = Prog ProgName [Dec] [Stat]
data Dec  = VDec Id Type
data Stat = Assign Id Expr | If Expr Stat Stat | ...

-- After extraction of [Dec] [Stat] to constitute a new datatype Block
data Prog  = Prog ProgName Block
data Block = Block [Dec] [Stat]

In the present paper, we describe the design of a framework for datatype transformations including the operators for the above extraction. In Sec. 2, we identify all the concerns addressed by the framework. In Sec. 3, we describe all the basic operators for datatype transformations. In Sec. 4, these operators are lifted from datatypes to complete programs. Related work is discussed in Sec. 5. The paper is concluded in Sec. 6.

2 Concerns in datatype transformation

The central contribution of the present paper is a simple, well-defined, and 'editing-complete' suite of operators for datatype transformations. Before we embark on this suite, we identify the concerns addressed by our approach:

• Datatype transformations via scripting or interactive tool support. 

• Well-defined primitives for datatype transformations. 

• Generic meta-programming for conciseness of datatype transformations. 

• Flexible means of referring to fragments of interest in datatype transformations. 
We will now discuss these concerns in some depth. 

2.1 Scripting vs. interactive tool support

From the point of view of a programmer, datatype transformations should be founded on intuitive scenarios for adaptation. To actually perform (datatype) transformations, there are two modes of operation. The first mode is scripting: the programmer encodes the desired transformation as an expression over basic or higher-level operators. The second mode is interactive transformation based on a corresponding GUI. The benefits of an interactive tool are rather obvious. Such a tool is useful to issue a transformation on the basis of an operator-specific dialogue, and to provide a tailored list of options for transformations that make sense in a given context. A crucial benefit of interactive transformation is that the GUI can be used to provide feedback to the programmer: Which locations were changed? Where is the programmer's attention needed to complete the issued transformation scenario? The apparent benefits of scripting, such as the opportunities to revise transformations and to replay them, can also be integrated into an interactive setting.

In Fig. 1, we illustrate the interactive treatment of the introductory example using our prototypical tool TH (Transform Haskell). As the snapshot indicates, we use a designated fold dialogue to perform the extraction of the piece of syntax. (Folding is the basic transformation underlying extraction.) This dialogue combines several transformation steps and side conditions in a convenient way. The figure shows the following situation. The user has selected two consecutive types "[Dec] [Stat]" and initiated the fold dialogue. The user has also typed in "Block" in the "type name" field. The introduction check-box is marked automatically since the given type name does not yet exist. The user has also selected the "kind" radio-button to be "data" and filled in "Block" in the "cons name" field. After this, the user would press "Replace" to make the change. If there had been more than one occurrence, the user could replace them all with "Replace All", or step through all occurrences with "Next" and replace only specific ones with "Replace", as with ordinary find and replace in text editors.
module ToyPascal where

data Prog = Prog ProgName [Dec] [Stat]

data Dec = VDec Id Type

data Stat = Assign Id Expr
          | If Expr Stat Stat

data Expr = Var Id
          | Const Int

type ProgName = String

type Id = String

type Type = String

Fig. 1. A snapshot related to the interactive treatment of the introductory example

Here is an open-ended list of further common transformation scenarios: 

• Renaming type and constructor names. 

• Permuting type arguments and constructor components. 

• The dual of extracting datatypes, i.e., inlining datatypes. 

• Including a constructor declaration together with associated functionality. 

• Excluding a constructor declaration together with associated functionality. 

• Inserting a constructor component together with associated functionality. 

• Deleting a constructor component together with associated functionality. 



2.2 Well-defined transformation primitives 

The core asset of our framework is a suite of basic operators, which can either be used as is or be composed into more complex, compound transformations. In the design of this suite, we reuse design experience from a related effort on grammar adaptation [Läm01]. Indeed, there is an obvious affinity between grammar transformations and datatype transformations. A challenging problem that we did not need to address in this previous work is the completion of datatype transformations to apply to entire (functional) programs in which evolving datatypes reside.

We list the required properties of our basic transformation operators: 

Correctness. Mostly, we insist on 'structure preservation', that is, the resulting datatype is of the same shape as the original datatype. This is enforced by the pre- and postconditions of the operators.

Completeness. The operators are 'editing-complete', that is, they capture all scenarios of datatype evolution that are otherwise performed by plain text editors. Semantics-preserving adaptations are defined in terms of disciplined primitives.

Orthogonality. The operators inhabit well-defined, non-overlapping roles. Higher-level scenarios for interactive transformation are derivable. Operators for datatype transformations are complementary to expression-level transformations.

Locality. The basic operators operate on small code locations as opposed to 'global' or 'exhaustive' operators, which iterate over the entire program. Note that some operators are necessarily exhaustive, e.g., an operator to rename a type name.

Implementability. The operators are implemented as syntactical transformations that are constrained by simple analyses to check for pre- and postconditions, but which otherwise do not necessitate any offline reasoning.

Universality. While the present paper focuses on datatype transformations, the principles that are embodied by our operators are universal in the sense that they also apply to abstractions other than datatypes, e.g., functions or modules.

We do not list these properties to announce a formal treatment. This would be 
very challenging as we opt for the complex language setup of Haskell. The above 
properties provide merely a design rationale. A formal approach is an important 
subject for future work, but it does not contribute anything to the narrow goal of the 
present paper: to compile an inventory of the basic roles in datatype transformation. 

2.3 Generic meta-programming 

We implement transformation operators and compound meta-programs in Haskell. We reuse a publicly available abstract syntax for Haskell.¹ We rely on generic programming techniques to perform meta-programming on the non-trivial Haskell syntax in Haskell. We use the Strafunski style² of generic programming, which allows us to complete functions on specific syntactical sorts into generic traversals that process subterms of the specific sorts accordingly. This style of meta-programming is known to be very concise because one only provides functionality for the types and constructors that are immediately relevant for the given problem.

All our datatype transformations are of type Trafo which is defined as follows: 

type Trafo = HsModule -> Maybe HsModule

That is, a datatype transformation is a partial function on HsModule, the abstract syntactical domain for Haskell modules. Partiality is expressed by means of the Maybe type constructor that wraps the result type. Partiality is needed to model side conditions.
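As a minimal sketch of what partiality buys, assume a drastically simplified, hypothetical stand-in for HsModule (a list of named declarations) rather than the haskell-src syntax. A side condition then becomes a Trafo that either returns its input unchanged or fails, and sequential composition of partial transformations can be expressed as Kleisli composition in the Maybe monad, so one failing step makes the whole composite fail. The names Decl, require, declares, and addDecl are illustrative only; seqTrafo is defined here as (>=>) purely for the purpose of this sketch.

```haskell
import Control.Monad ((>=>))

-- Hypothetical stand-in for HsModule: declarations reduced to a name and
-- an unparsed right-hand side (not the haskell-src representation).
data Decl = Decl { declName :: String, declRhs :: String }
  deriving (Eq, Show)

type Module = [Decl]

-- As in the text: a datatype transformation is a partial function.
type Trafo = Module -> Maybe Module

-- Sequential composition: Kleisli composition in the Maybe monad.
seqTrafo :: Trafo -> Trafo -> Trafo
seqTrafo = (>=>)

-- A side condition as a transformation: identity if it holds, failure otherwise.
require :: (Module -> Bool) -> Trafo
require p m = if p m then Just m else Nothing

declares :: String -> Module -> Bool
declares n = any ((== n) . declName)

-- Add a declaration, guarded by the side condition that its name is fresh.
addDecl :: Decl -> Trafo
addDecl d = require (not . declares (declName d)) `seqTrafo` (Just . (++ [d]))
```

Applying addDecl twice with the same declaration, as in (addDecl blk `seqTrafo` addDecl blk), yields Nothing, because the freshness condition of the second step fails.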

¹ The used abstract syntax is part of the Haskell Core Libraries, in the haskell-src package.

² http://www.cs.vu.nl/Strafunski/

-- Replace a type name
replaceTypeId :: TypeId -> TypeId -> Trafo
replaceTypeId n n' = full_tdTP (adhocTP (adhocTP idTP declSite) refSite)
  where
    -- Transform declaring occurrences of type names
    declSite :: HsDecl -> Maybe HsDecl
    declSite (HsTypeDecl l n0 ps t)         | n0 == n = return (HsTypeDecl l n' ps t)
    declSite (HsDataDecl l c n0 ps cds d)   | n0 == n = return (HsDataDecl l c n' ps cds d)
    declSite (HsNewTypeDecl l c n0 ps cd d) | n0 == n = return (HsNewTypeDecl l c n' ps cd d)
    declSite decl = return decl

    -- Transform using occurrences of type names
    refSite :: HsType -> Maybe HsType
    refSite (HsTyCon (UnQual n0)) | n0 == n = return (HsTyCon (UnQual n'))
    refSite tpe = return tpe

Fig. 2. Specification of the replacement operation underlying renaming of type names

In Fig. 2, we illustrate generic meta-programming by giving the definition of a simple operator for replacing type names. The specification formalises the fact that type names can occur in two kinds of locations: either on a declaration site, when we declare the type, or on a using site, when we refer to the type in a type expression. So we need to synthesise a transformation which pays special attention to the syntactical domains for declaring and using sites. Indeed, in the figure, there are two type-specific 'ad-hoc' cases which customise the identity function idTP. In the given context, we choose the traversal scheme full_tdTP for 'full top-down traversal in Type-Preserving manner'. This way, we will reach each node in the input tree to transform type names on declaring and using sites. The operator replaceTypeId, by itself, is a total function. (So the Maybe in its type is not really needed here.) Partiality would be an issue if we derived an operator for renaming type names. This necessitates adding a side condition to insist on a fresh new name.
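The following sketch shows such a derived renaming operator under simplifying assumptions: a hypothetical toy syntax replaces haskell-src, and a hand-written recursion plays the role of Strafunski's full_tdTP traversal. The replacement itself stays total; only the renaming wrapper is partial, failing unless the old name is declared and the new name is fresh. Checking freshness against declared names only is a simplification of this sketch; a full implementation would also have to consider imported names.

```haskell
-- Hypothetical toy syntax (stand-in for haskell-src's HsDecl and HsType).
data TypeExp = TyCon String | TyVar String | TyApp TypeExp TypeExp
  deriving (Eq, Show)

data ConDecl = ConDecl String [TypeExp]
  deriving (Eq, Show)

data Decl = DataDecl String [String] [ConDecl]   -- name, parameters, constructors
          | TypeDecl String [String] TypeExp     -- type alias
  deriving (Eq, Show)

type Module = [Decl]
type Trafo  = Module -> Maybe Module

-- Total replacement on declaring and using sites alike; the explicit
-- recursion stands in for the generic traversal scheme full_tdTP.
replaceTypeId :: String -> String -> Trafo
replaceTypeId n n' = Just . map declSite
  where
    rn x = if x == n then n' else x
    declSite (DataDecl x ps cds) =
      DataDecl (rn x) ps [ConDecl c (map refSite ts) | ConDecl c ts <- cds]
    declSite (TypeDecl x ps t) = TypeDecl (rn x) ps (refSite t)
    refSite (TyCon x)   = TyCon (rn x)
    refSite (TyVar v)   = TyVar v          -- type variables are not type names
    refSite (TyApp f a) = TyApp (refSite f) (refSite a)

declaredTypes :: Module -> [String]
declaredTypes = map name
  where
    name (DataDecl x _ _) = x
    name (TypeDecl x _ _) = x

-- Partial renaming: the old name must be declared, the new one fresh.
renameTypeId :: String -> String -> Trafo
renameTypeId n n' m
  | n `elem` ds && n' `notElem` ds = replaceTypeId n n' m
  | otherwise                      = Nothing
  where
    ds = declaredTypes m
```

Renaming ConsList to SnocList rewrites both the declaration and the recursive occurrence inside Cons, while renameTypeId "ConsList" "ConsList" fails on the freshness check.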

2.4 Means of referring to fragments of interest 

Not only the basic operators for datatype transformation but also actual transformation scenarios in scripts or in interactive sessions need to refer to program fragments of interest. Recall our introductory example. Extracting a type necessitates referring to the constructor components that are meant to constitute the new type. In our framework, we use three ways to refer to fragments of interest:

Focus markers on subterms. This approach is particularly suited for interactive transformations. Here, relevant fragments can be directly marked. In Fig. 3, we extend Haskell's abstract syntax to include term constructors for focusing on relevant fragments in datatype transformations. That is, we are prepared to focus on names of types, on type expressions, and on lists of constructor components.

Selectors of subterms. This approach is particularly suited for scripting transformations. Selectors for Haskell's type expressions are defined in Fig. 4. The three forms of TypeSel represent the three kinds of declarations that involve types. The helper TypeSel' allows one to select any part of a given type expression.
-- Focus on names
data HsName = ... | HsNameFocus HsName

-- Focus on type expressions
data HsType = ... | HsTypeFocus HsType

-- Focus on lists of constructor components
data HsConDecl = HsConDecl SrcLoc HsName [HsFocusedBangType]
               | HsRecDecl SrcLoc HsName [([HsName], HsBangType)]
data HsFocusedBangType = HsUnfocusedBangType HsBangType
                       | HsFocusedBangType [HsBangType]

Fig. 3. Kinds of focus for datatype transformation



data TypeSel
  = AliasRef TypeId  TypeSel'     -- Refer to a type alias
  | ConRef   ConPos  TypeSel'     -- Refer to a constructor component
  | SigRef   [FunId] TypeSel'     -- Refer to a function signature

data TypeSel'
  = SelStop                       -- Reference stops here
  | SelDom TypeSel'               -- Refer to domain of function type
  | SelCod TypeSel'               -- Refer to co-domain of function type
  | SelIth ParaPos TypeSel'       -- Refer to product component
  | SelFun TypeSel'               -- Refer to type constructor
  | SelArg TypeSel'               -- Refer to type argument

type TypeId  = HsName             -- Refer to a type
type ConId   = HsName             -- Refer to a constructor
type FunId   = HsName             -- Refer to a function name
type ConPos  = (ConId, ParaPos)   -- Refer to a component of a constructor
type ParaPos = Int                -- Refer to a parameter position
data HsName  = ...                -- Syntactical sort for all kinds of names

Fig. 4. Selectors that refer to type expressions, and others



Predicates on subterms. Such predicates typically constrain the type of a term or the top-level pattern. This approach is particularly suited for the repeated application of a transformation to different focuses that match a given predicate.

There are ways to mediate between these different ways of referring to subterms. 
For example, given a term with a focus marker on a type expression, one can 
compute the selector that refers to the focused subterm. Given a predicate on type 
expressions, one can compute the list of all selectors so that an operator that is 
defined on selectors can be used with predicates as well. Finally, given a selector, 
one can also add the corresponding focus marker in the input at hand. 
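One direction of this mediation, computing selectors from a predicate, can be sketched over a hypothetical toy representation of type expressions (function types, products, and type applications), with constructor names borrowed from TypeSel' in Fig. 4. The function select follows a selector and fails if the selector does not fit the shape of the type; selectorsFor enumerates, in pre-order, all selectors whose target satisfies the predicate, so that an operator defined on selectors could be mapped over them. The toy types and the function names are assumptions of this sketch.

```haskell
-- Hypothetical toy type expressions; the selector type mirrors TypeSel'.
data TypeExp = TyCon String
             | TyFun TypeExp TypeExp     -- domain -> co-domain
             | TyTuple [TypeExp]         -- product
             | TyApp TypeExp TypeExp     -- type constructor applied to argument
  deriving (Eq, Show)

data TypeSel' = SelStop | SelDom TypeSel' | SelCod TypeSel'
              | SelIth Int TypeSel' | SelFun TypeSel' | SelArg TypeSel'
  deriving (Eq, Show)

-- Follow a selector; Nothing if it does not fit the shape of the type.
select :: TypeSel' -> TypeExp -> Maybe TypeExp
select SelStop t = Just t
select (SelDom s) (TyFun d _) = select s d
select (SelCod s) (TyFun _ c) = select s c
select (SelIth i s) (TyTuple ts)
  | i >= 1 && i <= length ts = select s (ts !! (i - 1))
select (SelFun s) (TyApp f _) = select s f
select (SelArg s) (TyApp _ a) = select s a
select _ _ = Nothing

-- All selectors (pre-order) whose target satisfies the predicate.
selectorsFor :: (TypeExp -> Bool) -> TypeExp -> [TypeSel']
selectorsFor p t = [SelStop | p t] ++ below t
  where
    below (TyFun d c)  = map SelDom (selectorsFor p d) ++ map SelCod (selectorsFor p c)
    below (TyTuple ts) = concat [map (SelIth i) (selectorsFor p u) | (i, u) <- zip [1 ..] ts]
    below (TyApp f a)  = map SelFun (selectorsFor p f) ++ map SelArg (selectorsFor p a)
    below (TyCon _)    = []
```

For the type Int -> Maybe Int and the predicate (== TyCon "Int"), this yields the two selectors SelDom SelStop and SelCod (SelArg SelStop), and following either one with select leads back to Int.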

3 Basic operators for datatype transformation 

We will now describe the themes that constitute our operator suite: 

• Renaming type and constructor names. 

• Permutation of type parameters and constructor components. 

• Swapping types on use sites. 

• Introduction vs. elimination of type declarations. 

• Folding vs. unfolding of type declarations. 
• Wrapping vs. unwrapping of constructor components.

• Inclusion vs. exclusion of entire constructor declarations.

• Insertion vs. deletion of constructor components.

As this list makes clear, we group an operator with its inverse, as in "folding vs. unfolding", unless the operator can serve as its own inverse. This is the case for renaming, permutation, and swapping. The operators from the first six groups are (almost) structure-preserving. The last two groups deal with structure-extending and -reducing transformations. We will now explain the operators in detail, including illustrative examples. We will only explain the effect of the operators on datatype declarations, while we postpone lifting the operators to the level of complete programs until Sec. 4.

-- Sample input datatype
data ConsList a = Nil | Cons a (ConsList a)

-- Renamed and permuted datatype
data SnocList a = Lin | Snoc (SnocList a) a

Fig. 5. Illustration of renaming and permutation

renameTypeId  :: TypeId -> TypeId -> Trafo     -- Rename a type declaration
renameConId   :: ConId -> ConId -> Trafo       -- Rename a constructor
permuteTypeId :: TypeId -> [ParaPos] -> Trafo  -- Permute type parameters
permuteConId  :: ConId -> [ParaPos] -> Trafo   -- Permute constructor components

Fig. 6. Operators for renaming and parameter permutation

           renameTypeId (HsIdent "ConsList") (HsIdent "SnocList")
`seqTrafo` renameConId (HsIdent "Nil") (HsIdent "Lin")
`seqTrafo` renameConId (HsIdent "Cons") (HsIdent "Snoc")
`seqTrafo` permuteConId (HsIdent "Snoc") [2, 1]

Fig. 7. Script for the scenario in Fig. 5

3.1 Renaming and permutation

Let us start with the simplest datatype refactorings one can think of. These are transformations to consistently rename type or constructor names, and to permute parameters of type and constructor declarations. In Fig. 5, a simple example is illustrated. We rename the type name ConsList, the constructor names Nil and Cons, and we permute the two parameter positions of Cons. The resulting datatype specifies a SnocList as opposed to the ConsList before.

In Fig. 6, we declare the operators for renaming names and permuting parameter lists. In Fig. 7, we include the script that encodes the ConsList-to-SnocList sample as a sequence of basic renaming and permuting transformations. To this end, we assume a sequential composition operator seqTrafo for datatype transformations. (In the script, seqTrafo is used as an infix operator `seqTrafo`.)
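To make the script concrete, here is a small executable sketch over a hypothetical toy syntax rather than haskell-src. It implements declaration-level versions of renameTypeId, renameConId, and permuteConId with simple side conditions, and it defines seqTrafo as Kleisli composition in Maybe; the operators described in the paper additionally adapt the rest of the program, which this sketch does not attempt. The permutation list is read as follows (an assumption of this sketch): the component at new position j is the old component at position perm !! (j - 1), so [2, 1] turns Cons a (ConsList a) into Snoc (SnocList a) a.

```haskell
import Control.Monad ((>=>))
import Data.List (sort)

-- Hypothetical toy syntax, standing in for haskell-src.
data TypeExp = TyCon String | TyVar String | TyApp TypeExp TypeExp
  deriving (Eq, Show)
data ConDecl = ConDecl String [TypeExp] deriving (Eq, Show)
data Decl = DataDecl String [String] [ConDecl] deriving (Eq, Show)

type Module = [Decl]
type Trafo  = Module -> Maybe Module

seqTrafo :: Trafo -> Trafo -> Trafo
seqTrafo = (>=>)

constructors :: Module -> [String]
constructors m = [k | DataDecl _ _ cs <- m, ConDecl k _ <- cs]

-- Rename a type on declaring and using sites; old declared, new fresh.
renameTypeId :: String -> String -> Trafo
renameTypeId n n' m
  | n `notElem` tys || n' `elem` tys = Nothing
  | otherwise                        = Just (map decl m)
  where
    tys = [x | DataDecl x _ _ <- m]
    rn x = if x == n then n' else x
    decl (DataDecl x ps cs) = DataDecl (rn x) ps [ConDecl k (map ty ts) | ConDecl k ts <- cs]
    ty (TyCon x)   = TyCon (rn x)
    ty (TyApp f a) = TyApp (ty f) (ty a)
    ty t           = t

-- Rename a constructor; the old one must exist and the new one must be fresh.
renameConId :: String -> String -> Trafo
renameConId c c' m
  | c `notElem` cs || c' `elem` cs = Nothing
  | otherwise = Just [ DataDecl x ps [ConDecl (if k == c then c' else k) ts | ConDecl k ts <- cds]
                     | DataDecl x ps cds <- m ]
  where
    cs = constructors m

-- Permute the components of a constructor; perm must permute [1 .. n].
permuteConId :: String -> [Int] -> Trafo
permuteConId c perm m =
  case [ts | DataDecl _ _ cds <- m, ConDecl k ts <- cds, k == c] of
    [ts] | sort perm == [1 .. length ts] -> Just (map decl m)
    _                                    -> Nothing
  where
    decl (DataDecl x ps cds) = DataDecl x ps (map con cds)
    con (ConDecl k ts) | k == c = ConDecl k [ts !! (p - 1) | p <- perm]
    con cd = cd
```

With these definitions, the script of Fig. 7 can be transcribed almost verbatim (with plain strings instead of HsIdent) and applied to a module containing only ConsList; the result is the SnocList declaration of Fig. 5.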
data HsDecl = ...                  -- Syntactical sort for (type) declarations
introTypes :: [HsDecl] -> Trafo    -- Introduction of type declarations
elimTypes  :: [TypeId] -> Trafo    -- Elimination of type declarations

Fig. 8. Operators for introduction and elimination of datatypes

type TypeHdr = (TypeId, [TypeVar])          -- Header (LHS) of type declaration
type TypeVar = HsName                       -- Type variables
foldAlias   :: TypeSel -> TypeHdr -> Trafo  -- Folding the referred type
unfoldAlias :: TypeSel -> Trafo             -- Unfolding the referred type

Fig. 9. Operators for folding and unfolding
3.2 Introduction vs. elimination 



The next group of operators deals with the introduction and elimination of type declarations (see Fig. 8). Introduction means that the supplied types are added while their names must not be in use in the given program. Elimination means that the referenced types are removed while their names must not be referred to anymore in the resulting program. The two operators take lists of types as opposed to single ones because types can often only be introduced and eliminated in groups, say mutually recursive systems of datatypes. All kinds of type declarations make sense in this context: aliases, newtypes, and proper datatypes. The operators for introduction and elimination are often essential in compound transformations. This will be illustrated below when we reconstruct the introductory example in full detail (see Sec. 3.4).
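The two side conditions can be sketched over a hypothetical toy syntax as follows; the names declName, usedIn, and refs are illustrative helpers, not part of the paper's framework. Elimination checks references only in the declarations that remain, which is why a mutually recursive pair can be eliminated together but not one member at a time.

```haskell
import Data.List (nub)

-- Hypothetical toy syntax, standing in for haskell-src.
data TypeExp = TyCon String | TyVar String | TyApp TypeExp TypeExp | TyTuple [TypeExp]
  deriving (Eq, Show)
data Decl = DataDecl String [String] [(String, [TypeExp])]  -- name, params, constructors
          | TypeDecl String [String] TypeExp                -- type alias
  deriving (Eq, Show)
type Module = [Decl]
type Trafo  = Module -> Maybe Module

declName :: Decl -> String
declName (DataDecl n _ _) = n
declName (TypeDecl n _ _) = n

-- Type names referenced in a type expression / on a declaration's right-hand side.
refs :: TypeExp -> [String]
refs (TyCon n)    = [n]
refs (TyVar _)    = []
refs (TyApp f a)  = refs f ++ refs a
refs (TyTuple ts) = concatMap refs ts

usedIn :: Decl -> [String]
usedIn (DataDecl _ _ cs) = concatMap (concatMap refs . snd) cs
usedIn (TypeDecl _ _ t)  = refs t

-- Introduction: the new names must not be in use (nor clash among themselves).
introTypes :: [Decl] -> Trafo
introTypes ds m
  | any (`elem` map declName m) new = Nothing
  | length (nub new) /= length new  = Nothing
  | otherwise                       = Just (m ++ ds)
  where
    new = map declName ds

-- Elimination: the names must be declared and unreferenced afterwards.
elimTypes :: [String] -> Trafo
elimTypes ns m
  | any (`notElem` map declName m) ns       = Nothing
  | any (`elem` ns) (concatMap usedIn rest) = Nothing
  | otherwise                               = Just rest
  where
    rest = filter ((`notElem` ns) . declName) m
```

For two datatypes A and B that refer to each other, elimTypes ["A"] fails because B still mentions A, whereas elimTypes ["A", "B"] succeeds.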



3.3 Folding vs. unfolding 

Instantiating the folklore notions of unfolding and folding for datatypes basically 
means to replace a type name by its definition and vice versa. Extra provisions 
are needed for parameterised datatypes. The prime usage scenarios for the two 
operators are the following: 

• extraction = introduction of a type followed by its folding. 

• inlining = unfolding a type followed by its elimination. 

To give an example, the introductory example basically extracts the structure of imperative program blocks. To actually reconstruct this example, we need a few more operators. So we postpone scripting the example (see Sec. 3.4).

The operators for folding and unfolding are declared in Fig. 9. The operators make a strict assumption: the type which is subject to folding or unfolding is necessarily a type alias as opposed to a proper datatype. This assumption simplifies the treatment of the operators considerably since type aliases and their definitions are equivalent by definition. Extra operators for so-called wrapping and unwrapping allow us to use proper datatypes during folding and unfolding as well. This will be addressed below. In the type of the foldAlias operator, we do not just provide a type name but also a list of type variables (cf. helper type TypeHdr). This is needed for parameterised datatypes, where we want to specify how the free type variables in the selected type expression map to the argument positions of the type alias.

type ConRange = (ConPos, Int)              -- Refer to consecutive components
groupConRange :: ConRange -> Trafo         -- Group constructor components
ungroupConPos :: ConPos -> Trafo           -- Inline product
alias2newtype :: TypeId -> ConId -> Trafo  -- Turn type alias into newtype
newtype2data  :: TypeId -> Trafo           -- Turn newtype into datatype
data2newtype  :: TypeId -> Trafo           -- Turn datatype into newtype
newtype2alias :: TypeId -> Trafo           -- Turn newtype into type alias

Fig. 10. Operators for wrapping and unwrapping

-- 0. Original syntax
data Prog = Prog ProgName [Dec] [Stat]
data Dec  = VDec Id Type
data Stat = Assign Id Expr | If Expr Stat Stat
data Expr = Var Id | Const Int

-- 1. After grouping [Dec] and [Stat]
data Prog = Prog ProgName ([Dec], [Stat])

-- 2. After introduction of Block to prepare folding
data Prog  = Prog ProgName ([Dec], [Stat])
type Block = ([Dec], [Stat])

-- 3. After folding away the type expression ([Dec], [Stat])
data Prog  = Prog ProgName Block
type Block = ([Dec], [Stat])

-- 4. After turning Block into a proper datatype with the constructor Block
data Prog  = Prog ProgName Block
data Block = Block ([Dec], [Stat])

-- 5. After ungrouping the product ([Dec], [Stat])
data Prog  = Prog ProgName Block
data Block = Block [Dec] [Stat]

Fig. 11. Illustration of wrapping, unwrapping, and extraction

The preconditions for the operators are as follows. In the case of foldAlias, we need 
to check if the referenced type expression and the right-hand side of the given alias 
declaration coincide. In the case of unfolding, we need to check that the referenced 
type expression corresponds to an application of a type alias. 
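Under strong simplifying assumptions, these preconditions can be sketched as follows: a hypothetical toy syntax, non-parameterised aliases only (so the type-variable list of TypeHdr is empty and omitted), and a selector restricted to a whole constructor component, corresponding to ConRef (c, i) SelStop. Folding succeeds only if the selected component coincides with the alias's right-hand side; unfolding only if the component is an application of an alias. The function extract composes introduction and folding, following the extraction scenario listed earlier; atComponent, aliasRhs, introAlias, and extract are illustrative names.

```haskell
import Control.Monad ((>=>))

-- Hypothetical toy syntax; aliases are non-parameterised in this sketch.
data TypeExp = TyCon String | TyApp TypeExp TypeExp | TyTuple [TypeExp]
  deriving (Eq, Show)
data Decl = DataDecl String [(String, [TypeExp])]  -- type name, constructors
          | TypeDecl String TypeExp                -- type alias
  deriving (Eq, Show)
type Module = [Decl]
type Trafo  = Module -> Maybe Module

aliasRhs :: String -> Module -> Maybe TypeExp
aliasRhs a m = case [t | TypeDecl n t <- m, n == a] of
  [t] -> Just t
  _   -> Nothing

-- Apply a partial rewrite to component i (1-based) of constructor c.
atComponent :: (String, Int) -> (TypeExp -> Maybe TypeExp) -> Trafo
atComponent (c, i) f m
  | null [() | DataDecl _ cs <- m, (k, ts) <- cs, k == c, i >= 1, i <= length ts] = Nothing
  | otherwise = traverse decl m
  where
    decl (DataDecl n cs) = DataDecl n <$> traverse con cs
    decl d               = Just d
    con (k, ts)
      | k == c    = (\t' -> (k, take (i - 1) ts ++ [t'] ++ drop i ts)) <$> f (ts !! (i - 1))
      | otherwise = Just (k, ts)

-- Precondition: the selected component equals the alias's right-hand side.
foldAlias :: (String, Int) -> String -> Trafo
foldAlias pos a m = do
  rhs <- aliasRhs a m
  atComponent pos (\t -> if t == rhs then Just (TyCon a) else Nothing) m

-- Precondition: the selected component is an application of an alias.
unfoldAlias :: (String, Int) -> Trafo
unfoldAlias pos m = atComponent pos expand m
  where
    expand (TyCon a) = aliasRhs a m
    expand _         = Nothing

introAlias :: String -> TypeExp -> Trafo
introAlias a t m
  | a `elem` ([n | TypeDecl n _ <- m] ++ [n | DataDecl n _ <- m]) = Nothing
  | otherwise = Just (m ++ [TypeDecl a t])

-- extraction = introduction of a type followed by its folding
extract :: (String, Int) -> String -> TypeExp -> Trafo
extract pos a t = introAlias a t >=> foldAlias pos a
```

Applied to Prog ProgName ([Dec], [Stat]) at position 2, extract yields Prog ProgName Block plus the alias Block, mirroring steps 2 and 3 of Fig. 11; unfolding the same position afterwards restores the tuple.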



3.4 Wrapping vs. unwrapping 

We will now consider operators that facilitate certain forms of wrapping and unwrapping of datatype constructors (see Fig. 10). There are operators for grouping and ungrouping, that is, to turn consecutive constructor components into a single component that is of a product type, and vice versa. There are also operators to mediate between the different kinds of type declarations, namely type aliases, newtypes, and datatypes. This will allow us to toggle the representation of datatypes in basic ways. As a result, the normal forms assumed by other operators can be established; recall, for example, the use of type aliases in folding and unfolding. This separation of concerns serves orthogonality.
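Grouping and ungrouping can be sketched on a plain list of constructor declarations, again over a hypothetical toy syntax. A ConRange ((c, i), k) selects k consecutive components of constructor c starting at position i; grouping replaces them by one tuple component, and ungrouping splices a tuple component back in. Requiring k >= 2 is an assumption of this sketch, since Haskell has no one-element tuples; onCon, groupComps, and ungroupComp are illustrative helpers.

```haskell
-- Hypothetical toy syntax, standing in for haskell-src.
data TypeExp = TyCon String | TyApp TypeExp TypeExp | TyTuple [TypeExp]
  deriving (Eq, Show)
data ConDecl = ConDecl String [TypeExp]
  deriving (Eq, Show)

-- Turn components i .. i+k-1 into a single tuple component (k >= 2 assumed).
groupComps :: Int -> Int -> [TypeExp] -> Maybe [TypeExp]
groupComps i k ts
  | i >= 1, k >= 2, i + k - 1 <= length ts =
      Just (take (i - 1) ts ++ [TyTuple (take k (drop (i - 1) ts))] ++ drop (i - 1 + k) ts)
  | otherwise = Nothing

-- Splice the components of the tuple at position i back into the constructor.
ungroupComp :: Int -> [TypeExp] -> Maybe [TypeExp]
ungroupComp i ts
  | i >= 1, i <= length ts, TyTuple us <- ts !! (i - 1) =
      Just (take (i - 1) ts ++ us ++ drop i ts)
  | otherwise = Nothing

-- Apply a component-list rewrite to constructor c, failing if c is absent.
onCon :: String -> ([TypeExp] -> Maybe [TypeExp]) -> [ConDecl] -> Maybe [ConDecl]
onCon c f cds
  | c `notElem` [k | ConDecl k _ <- cds] = Nothing
  | otherwise = traverse step cds
  where
    step (ConDecl k ts)
      | k == c    = ConDecl k <$> f ts
      | otherwise = Just (ConDecl k ts)

groupConRange :: ((String, Int), Int) -> [ConDecl] -> Maybe [ConDecl]
groupConRange ((c, i), k) = onCon c (groupComps i k)

ungroupConPos :: (String, Int) -> [ConDecl] -> Maybe [ConDecl]
ungroupConPos (c, i) = onCon c (ungroupComp i)
```

Grouping components 2 and 3 of Prog ProgName [Dec] [Stat] gives step 1 of Fig. 11, and ungrouping position 2 of the result restores the original declaration.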
           groupConRange ((HsIdent "Prog", 2), 2)
`seqTrafo` introTypes [HsTypeDecl noLoc (HsIdent "Block") []
                         (HsTyTuple [ HsTyApp (HsTyCon (UnQual (HsIdent "List")))
                                              (HsTyCon (UnQual (HsIdent "Dec")))
                                    , HsTyApp (HsTyCon (UnQual (HsIdent "List")))
                                              (HsTyCon (UnQual (HsIdent "Stat"))) ])]
`seqTrafo` foldAlias (ConRef (HsIdent "Prog", 2) SelStop) (HsIdent "Block", [])
`seqTrafo` alias2newtype (HsIdent "Block") (HsIdent "Block")
`seqTrafo` newtype2data (HsIdent "Block")
`seqTrafo` ungroupConPos (HsIdent "Block", 1)

Fig. 12. Script for the scenario in Fig. 11



data Maybe a = Nothing | Just a

data Maybe' a = Nothing' | Just' a

data Maybe' a = Nothing' | Just' a (Maybe' a)

data ConsList a = Nil | Cons a (ConsList a)

Fig. 13. Illustration of the generalisation of Maybe to ConsList

In Fig. 11, we show the steps that implement the introductory example. As one can see, we basically implement extraction, but extra steps deal with grouping and ungrouping the two components subject to extraction. Also, the extracted type should be a proper datatype as opposed to a type alias (see the transition from 3. to 4.). For completeness' sake, the transformation script is shown in Fig. 12. The script precisely captures the steps that underlie the interactive transformation in Fig. 1.

Some of the operators are not completely structure-preserving, that is, strictly speaking, the structures of the datatypes before and after transformation are not fully equivalent. For example, a newtype and a datatype are semantically distinguished, even if the defining constructor declaration is the very same. (This is because a constructor of a datatype involves an extra lifting step in the semantical domain, i.e., there is an extra 'bottom' element.) The operators for grouping and ungrouping also deviate from full structure preservation.
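The extra bottom element can be observed directly in Haskell. In the following sketch, the names BlockN, BlockD, isBlockN, isBlockD, isPair, and probe are hypothetical, and the Int lists merely stand in for [Dec] and [Stat]. Matching on a newtype constructor does not force the scrutinee, whereas matching on a data constructor, or on a pair as introduced by grouping, does; probe uses evaluate and try from Control.Exception to report whether evaluation hit undefined.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Two representations of Block, as in steps 3 and 4 of Fig. 11
-- (Int lists are placeholders for [Dec] and [Stat]).
newtype BlockN = BlockN ([Int], [Int])
data    BlockD = BlockD ([Int], [Int])

-- Matching a newtype constructor never forces the argument ...
isBlockN :: BlockN -> Bool
isBlockN (BlockN _) = True

-- ... whereas matching a data constructor forces it to weak head normal form.
isBlockD :: BlockD -> Bool
isBlockD (BlockD _) = True

-- The same holds for the product introduced by grouping.
isPair :: ([Int], [Int]) -> Bool
isPair (_, _) = True

-- Just the result, or Nothing if evaluation raised an ErrorCall (e.g. undefined).
probe :: Bool -> IO (Maybe Bool)
probe b = do
  r <- try (evaluate b) :: IO (Either ErrorCall Bool)
  return (either (const Nothing) Just r)
```

Here probe (isBlockN undefined) returns Just True, while probe (isBlockD undefined) and probe (isPair undefined) return Nothing, which is exactly the semantic difference that prevents these conversions from being fully structure-preserving.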



3.5 Swapping types on use sites 

We will now deal with transformations that eliminate or establish type distinctions by what we call swapping types on use sites. In Fig. 13, we illustrate a typical application of swapping. In the example, we want to generalise the standard datatype Maybe to allow for lists instead. In fact, we do not want to change the general definition of the library datatype Maybe, but we only want to change it on one use site (not shown in the figure). This is where swapping helps: as an intermediate step, we can replace Maybe on the use site by a newly introduced datatype Maybe' with equivalent structure. The figure illustrates how subsequent adaptations derive the ConsList datatype from the clone of the Maybe datatype. In particular, we add the recursive constructor component (Maybe' a) to Just'.

type DataNames   = (TypeId, [ConId])
type DataUnifier = (DataNames, DataNames)
swapAlias :: TypeSel -> TypeId -> TypeId -> Trafo
swapData  :: TypeSel -> [DataUnifier] -> Trafo

Fig. 14. Operators for swapping types on use sites

type ConDecl = (ConId, [HsType])             -- Constructor declaration
data HsType = ...                            -- Syntactical sort for type expressions
includeConDecl :: TypeId -> ConDecl -> Trafo
excludeConDecl :: ConId -> Trafo

Fig. 15. Operators for inclusion and exclusion of constructor declarations

-- Syntax as of Fig. 11
data Prog  = Prog ProgName Block
data Block = Block [Dec] [Stat]
data Dec   = VDec Id Type
data Stat  = Assign Id Expr | If Expr Stat Stat
data Expr  = Var Id | Const Int

-- After syntax extension by statement blocks
data Stat  = Assign Id Expr | If Expr Stat Stat | SBlock Block

Fig. 16. Illustration of constructor inclusion

The swapping operators are declared in Fig. 14. There is one operator for type aliases and another for datatype declarations. In the case of proper datatypes, one needs to match the constructors in addition to just the names of the types. This is modelled by the helper type DataUnifier. The type of the operator swapData clarifies that we are prepared to process a list of DataUnifiers. This is necessary if we want to swap mutually recursive systems of datatypes.
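Before swapData can exchange one datatype for another on a use site, the two must coincide up to the names recorded in the DataUnifiers. The following sketch, over a hypothetical toy syntax and with the illustrative function name unifiable, checks only this precondition, not the rewrite of the use site itself. Type variables are matched positionally, and the type names of all unifiers are mapped simultaneously, which is what lets mutually recursive systems, and the recursive occurrence of Maybe' in the extended clone, line up.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical toy syntax; only proper datatypes are modelled.
data TypeExp = TyCon String | TyVar String | TyApp TypeExp TypeExp
  deriving (Eq, Show)
data Decl = DataDecl String [String] [(String, [TypeExp])]
  deriving (Eq, Show)

type DataNames   = (String, [String])   -- (TypeId, [ConId])
type DataUnifier = (DataNames, DataNames)

lookupData :: String -> [Decl] -> Maybe Decl
lookupData n ds = case [d | d@(DataDecl m _ _) <- ds, m == n] of
  [d] -> Just d
  _   -> Nothing

-- Each left datatype must coincide with its right partner once type names,
-- constructor names, and type variables are mapped across.
unifiable :: [DataUnifier] -> [Decl] -> Bool
unifiable us ds = all ok us
  where
    tyMap = [(t, t') | ((t, _), (t', _)) <- us]
    ok ((t, cs), (t', cs')) = case (lookupData t ds, lookupData t' ds) of
      (Just (DataDecl _ ps alts), Just (DataDecl _ ps' alts')) ->
           length ps == length ps'
        && map fst alts == cs && map fst alts' == cs' && length cs == length cs'
        && and [ map (rename (zip ps ps')) ts == ts'
               | ((_, ts), (_, ts')) <- zip alts alts' ]
      _ -> False
    rename vs (TyVar v)   = TyVar (fromMaybe v (lookup v vs))
    rename _  (TyCon n)   = TyCon (fromMaybe n (lookup n tyMap))
    rename vs (TyApp f a) = TyApp (rename vs f) (rename vs a)
```

In the example of Fig. 13, Maybe and its clone Maybe' are unifiable, and so are the extended Maybe' and ConsList, whereas Maybe and ConsList are not, since Just and Cons differ in their number of components.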



3.6 Inclusion vs. exclusion 

We now leave the ground of structure-preserving transformations. That is, we will consider transformations where input and output datatypes are not structurally equivalent. In fact, we consider certain ways to extend or reduce the structure of the datatype. The first pair of structure-extending and -reducing transformations is about inclusion and exclusion of constructor declarations (see Fig. 15). These operators are only feasible for proper datatypes and not for type aliases or newtypes. (This is because a type alias involves no constructor at all, and a newtype is defined in terms of precisely one constructor declaration.)
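A minimal sketch of the two operators at the level of datatype declarations, over a hypothetical toy syntax in which ConDecl is a pair as in Fig. 15. Inclusion is rejected unless the target is a proper datatype and the constructor name is fresh; exclusion only considers constructors of proper datatypes, so the sole constructor of a newtype is never removed. Adapting functions that use the affected constructors is outside this sketch, and allCons is an illustrative helper.

```haskell
-- Hypothetical toy syntax; ConDecl mirrors (ConId, [HsType]).
data TypeExp = TyCon String | TyVar String | TyApp TypeExp TypeExp
  deriving (Eq, Show)
type ConDecl = (String, [TypeExp])
data Decl = DataDecl String [String] [ConDecl]
          | NewtypeDecl String [String] ConDecl
          | TypeDecl String [String] TypeExp
  deriving (Eq, Show)
type Module = [Decl]
type Trafo  = Module -> Maybe Module

allCons :: Module -> [String]
allCons m = [c | DataDecl _ _ cs <- m, (c, _) <- cs] ++ [c | NewtypeDecl _ _ (c, _) <- m]

-- Add a constructor to a proper datatype; its name must be fresh.
includeConDecl :: String -> ConDecl -> Trafo
includeConDecl t cd@(c, _) m
  | c `elem` allCons m                      = Nothing
  | null [() | DataDecl n _ _ <- m, n == t] = Nothing   -- no alias, no newtype
  | otherwise                               = Just (map add m)
  where
    add (DataDecl n ps cs) | n == t = DataDecl n ps (cs ++ [cd])
    add d                           = d

-- Remove a constructor; only constructors of proper datatypes qualify.
excludeConDecl :: String -> Trafo
excludeConDecl c m
  | c `notElem` [k | DataDecl _ _ cs <- m, (k, _) <- cs] = Nothing
  | otherwise = Just (map drop1 m)
  where
    drop1 (DataDecl n ps cs) = DataDecl n ps (filter ((/= c) . fst) cs)
    drop1 d                  = d
```

Including ("SBlock", [TyCon "Block"]) into Stat reproduces the extension of Fig. 16, and excluding SBlock again restores the original declaration.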

In Fig. 16, we show an example for constructor inclusion. In fact, we just continue the introductory example to make use of the extracted block structure in a language extension for statement blocks. That is, we include a constructor declaration for Stat to capture Block as another statement form. This continuation of the introductory example amplifies the intended use of our operator suite: for program evolution in the sense of datatype refactoring and adaptation.

KORT & LÄMMEL

insertConComp :: ConPos -> HsType -> Trafo
deleteConComp :: ConPos -> Trafo

Fig. 17. Operators for insertion and deletion of constructor components

-- A datatype for a transition relation/function, and helpers
type TransRel a = a -> Maybe a
data Maybe a = Nothing | Just a
data ConsList a = Nil | Cons a (ConsList a)

-- Introduction of a substitute for Maybe
data Maybe' a = Nothing' | Just' a

-- Swapping Maybe and Maybe' in TransRel
type TransRel a = a -> Maybe' a

-- Extension of Maybe' to fit with the shape of ConsList
data Maybe' a = Nothing' | Just' a (Maybe' a)

-- Swapping Maybe' and ConsList in TransRel
type TransRel a = a -> ConsList a

Fig. 18. Illustration of component insertion and type swapping

3.7 Insertion vs. deletion

Inclusion and exclusion of constructor declarations is about the branching structure of datatypes. We will now discuss operators that serve for the insertion or deletion of constructor components (see Fig. 17). Insertion of a component $c$ into a constructor declaration $C\,c_1 \cdots c_n$ proceeds as follows. Given the target position $i$ for the new component, with $1 \leq i \leq n+1$, the new constructor declaration is simply of the form $C\,c_1 \cdots c_{i-1}\,c\,c_i \cdots c_n$. In general, $c$ might need to refer to type parameters of the affected datatype. Deletion of a constructor component relies on the identification of the obsolete component.
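The index arithmetic translates directly into list operations: inserting c at position i yields take (i-1) cs ++ [c] ++ drop (i-1) cs, i.e. $C\,c_1 \cdots c_{i-1}\,c\,c_i \cdots c_n$. The sketch below uses a hypothetical toy syntax; as an assumed side condition, it also rejects a new component that mentions type variables which are not parameters of the affected datatype, reflecting the remark that c may refer to those parameters. The helpers onCon and tyVars are illustrative.

```haskell
-- Hypothetical toy syntax, standing in for haskell-src.
data TypeExp = TyCon String | TyVar String | TyApp TypeExp TypeExp
  deriving (Eq, Show)
data Decl = DataDecl String [String] [(String, [TypeExp])]  -- name, params, constructors
  deriving (Eq, Show)
type Trafo = [Decl] -> Maybe [Decl]

tyVars :: TypeExp -> [String]
tyVars (TyVar v)   = [v]
tyVars (TyCon _)   = []
tyVars (TyApp f a) = tyVars f ++ tyVars a

-- Rewrite the components of constructor c; the rewrite also sees the
-- parameters of the enclosing datatype. Fails if c is not declared.
onCon :: String -> ([String] -> [TypeExp] -> Maybe [TypeExp]) -> Trafo
onCon c f m
  | c `notElem` [k | DataDecl _ _ cs <- m, (k, _) <- cs] = Nothing
  | otherwise = traverse decl m
  where
    decl (DataDecl n ps cs) = DataDecl n ps <$> traverse (con ps) cs
    con ps (k, ts)
      | k == c    = (,) k <$> f ps ts
      | otherwise = Just (k, ts)

-- C c1 ... cn becomes C c1 ... c(i-1) t ci ... cn, for 1 <= i <= n+1.
insertConComp :: (String, Int) -> TypeExp -> Trafo
insertConComp (c, i) t = onCon c step
  where
    step ps ts
      | i >= 1, i <= length ts + 1, all (`elem` ps) (tyVars t) =
          Just (take (i - 1) ts ++ [t] ++ drop (i - 1) ts)
      | otherwise = Nothing

-- Remove the i-th component of constructor c.
deleteConComp :: (String, Int) -> Trafo
deleteConComp (c, i) = onCon c step
  where
    step _ ts
      | i >= 1, i <= length ts = Just (take (i - 1) ts ++ drop i ts)
      | otherwise              = Nothing
```

Inserting the recursive component (Maybe' a) at position 2 of Just' gives the extended Maybe' of Fig. 18, and deleting that component undoes the step.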

In Fig. 18, we elaborate on the earlier example for generalising 'maybies' to lists (recall Fig. 13). At the top of Fig. 18, we see three types TransRel, Maybe, and ConsList. The idea is indeed to replace Maybe by ConsList in the using occurrence in TransRel. (That is, we want to allow for a function from a to a list of a's instead of a partial function from a to a.) We call this adaptation a generalisation because a list is more general than an optional value. In the initial phase of the generalisation of Maybe, we disconnect the relevant occurrence of Maybe in TransRel from other possible occurrences in the program. So we introduce a copy Maybe' of Maybe, and we perform type swapping so that TransRel refers to Maybe' instead of the 'read-only' Maybe. Now we need to make Maybe' structurally equivalent to ConsList. This amounts to adding a recursive component to the second constructor Just'. Then, we can again swap types to refer to ConsList in the co-domain of TransRel.
4 Datatype transformation meets program transformation

We will now revisit the groups of operators to investigate their impact on functional programs. It would be utterly complex to formalise the link between datatype and program transformation. The mere specification of the transformations is already intractable for a publication because of its size and the number of details. So we will describe the implied program transformations informally while omitting less interesting details.

4.1 Renaming 

Type names only occur inside type declarations and type annotations. So there is
no need to adapt expressions or function declarations except for their signatures, or
the type annotations of expressions. Constructor names, by contrast, do occur inside
patterns and expressions that contribute to function declarations. Renaming these
occurrences is completely straightforward.
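Renaming a constructor is a simple traversal. The sketch below assumes a hypothetical mini abstract syntax (Pat, Expr), far smaller than Haskell's, and rewrites every occurrence of one constructor name in patterns and expressions while leaving variables untouched.

```haskell
-- Hypothetical mini abstract syntax, much smaller than Haskell's.
data Pat = PVar String | PWild | PCon String [Pat]
  deriving (Eq, Show)

data Expr
  = Var String
  | Con String                -- a constructor used as an expression
  | App Expr Expr
  | Lam String Expr
  | Case Expr [(Pat, Expr)]
  deriving (Eq, Show)

-- Replace the constructor name old by new; other names are kept.
renameCon :: String -> String -> String -> String
renameCon old new c = if c == old then new else c

renameInPat :: String -> String -> Pat -> Pat
renameInPat old new (PCon c ps) =
  PCon (renameCon old new c) (map (renameInPat old new) ps)
renameInPat _ _ p = p

renameInExpr :: String -> String -> Expr -> Expr
renameInExpr old new = go
  where
    go (Con c) = Con (renameCon old new c)
    go (Var v) = Var v
    go (App f x) = App (go f) (go x)
    go (Lam v b) = Lam v (go b)
    go (Case s alts) = Case (go s) [(renameInPat old new p, go rhs) | (p, rhs) <- alts]
```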

4.2 Permutation 

The permutation of type parameters does not necessitate any completion at the
level of function declarations. The permutation of constructor components, however,
needs to be realised in patterns and expressions as well. This is particularly
simple for pattern-match cases because all components are matched by definition.
Hence, we can directly permute the sub-patterns in an affected constructor pattern.
Witnessing permutations of constructor components in expression forms is slightly
complicated by currying and higher-order style. Instead of permuting components
in possibly incomplete constructor applications, we could first get access to all
components by 'λ-pumping': given a constructor $C$ with, say, $n$ potential
components according to its declaration, we first replace $C$ by
$\lambda x_1 \cdots x_n.\; C\,x_1 \cdots x_n$, as justified by η-conversion. Then, we
witness the permutation by permuting the arguments $x_1, \ldots, x_n$ in the
pumped-up expression. In a non-strict language with an evaluation order on patterns,
the permutation of constructor components might actually change the termination
behaviour of the program. We neglect this problem. We should also mention that it
is debatable whether the described kind of η-conversion is really what the
programmer wants, because it obscures the code.
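As an illustration, consider a hypothetical declaration VDec whose two components (simplified here to String and Int) are swapped. The pattern is permuted directly, whereas the partial application VDec "x" is first λ-pumped and the arguments are then permuted in the body of the lambda. The primed names stand for the transformed program and are chosen only so that both versions fit into one module.

```haskell
-- Before: a simplified VDec (identifier and type reduced to String and Int).
data VDec = VDec String Int
  deriving (Eq, Show)

-- After permuting the two components.
data VDec' = VDec' Int String
  deriving (Eq, Show)

-- Pattern-match case: the sub-patterns are permuted directly.
declName :: VDec -> String
declName (VDec n _) = n

declName' :: VDec' -> String
declName' (VDec' _ n) = n

-- Expression case: VDec "x" is pumped to (\x1 x2 -> VDec x1 x2) "x", and the
-- arguments are then permuted in the body of the lambda.
mkX :: Int -> VDec
mkX = VDec "x"

mkX' :: Int -> VDec'
mkX' = (\x1 x2 -> VDec' x2 x1) "x"
```

The lambda keeps the parameters in the old order, so the surrounding context, which still supplies arguments in that order, needs no change.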

4.3 Introduction vs. elimination 

Introduction does not place any obligations on the functions defined in the same 
program. In the case of elimination, we have to ensure that the relevant types are not 
used by any function. If we assume that all function declarations are annotated by 
programmer-supplied or inferred signatures, then the precondition for elimination
can be checked by looking at these signatures. There is an alternative approach that 
does not rely on complete type annotations: we check that no constructor of the 
relevant types is used. 
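The constructor-based precondition can be sketched as follows, assuming (hypothetically) that the constructor names occurring in all function declarations have already been collected by a traversal of the program, and that datatypes are represented just by their names and constructor names.

```haskell
-- Hypothetical representation: a datatype name with its constructor names.
type DataDecl = (String, [String])

-- The constructors of the datatypes that are to be eliminated.
constructorsOf :: [DataDecl] -> [String] -> [String]
constructorsOf decls targets = concat [cs | (t, cs) <- decls, t `elem` targets]

-- Elimination is admissible if none of these constructors is used by any
-- function declaration; usedCons lists the constructor names found there.
canEliminate :: [DataDecl] -> [String] -> [String] -> Bool
canEliminate decls targets usedCons =
  not (any (`elem` usedCons) (constructorsOf decls targets))
```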
4.4 Folding vs. unfolding

The restriction of folding and unfolding to type aliases guarantees that these oper- 
ators do not necessitate any adaptation of the function declarations. This is simply 
because interchanging a type alias and its definition is completely structure- and 
semantics-preserving, by definition. This is extremely convenient: despite the cru- 
cial role of the operators for folding and unfolding, they do not raise any issue at 
the level of function declarations. 

4.5 Wrapping vs. unwrapping 

Grouping and ungrouping These operators are handled using the same overall ap- 
proach as advocated for the permutation of constructor components. That is, in 
patterns we witness grouping or ungrouping by inserting or removing the enclosing
"( ... )"; in expressions, we perform η-conversion to access the relevant components,
and then we group or ungroup them in the pumped-up constructor application.
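Grouping can be illustrated with the Prog example from the introduction, here with the element types simplified to String and with the grouped components represented by a pair rather than by a new datatype. ProgG, stats, statsG, mkProg, and mkProgG are hypothetical names chosen so that the version before and the version after grouping fit into one module.

```haskell
-- Before grouping: a simplified Prog (element types reduced to String).
data Prog = Prog String [String] [String]
  deriving (Eq, Show)

-- After grouping the last two components into a pair.
data ProgG = ProgG String ([String], [String])
  deriving (Eq, Show)

-- Patterns: the grouped sub-patterns are enclosed in "( ... )".
stats :: Prog -> [String]
stats (Prog _ _ ss) = ss

statsG :: ProgG -> [String]
statsG (ProgG _ (_, ss)) = ss

-- Expressions: the partial application Prog "main" is pumped to
-- \x1 x2 x3 -> Prog x1 x2 x3 and the components are grouped in the body.
mkProg :: [String] -> [String] -> Prog
mkProg = Prog "main"

mkProgG :: [String] -> [String] -> ProgG
mkProgG = (\x1 x2 x3 -> ProgG x1 (x2, x3)) "main"
```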

Mediation between newtypes and datatypes These datatype transformations do not 
imply any adaptations of the functions that involve the datatype in question. (As 
we indicated earlier, the extra bottom value of a datatype, when compared to a 
newtype, allows a program to be 'undefined' in one more way.) 
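The one semantic difference can be observed with a small experiment, assuming the standard Haskell semantics for newtype patterns: matching the constructor of a newtype never forces the argument, whereas matching a data constructor does. AgeN, AgeD, matchN, matchD, and throwsErrorCall are hypothetical names.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical example types that differ only in newtype vs. data.
newtype AgeN = AgeN Int
data AgeD = AgeD Int

-- Matching a newtype constructor does not force the scrutinee.
matchN :: AgeN -> ()
matchN (AgeN _) = ()

-- Matching a data constructor forces the scrutinee to weak head normal form.
matchD :: AgeD -> ()
matchD (AgeD _) = ()

-- True if evaluating the given value raises an ErrorCall, as 'undefined' does.
throwsErrorCall :: () -> IO Bool
throwsErrorCall x = do
  r <- try (evaluate x) :: IO (Either ErrorCall ())
  return (either (const True) (const False) r)
```

Here throwsErrorCall (matchN undefined) yields False, while throwsErrorCall (matchD undefined) yields True: the data version has the additional way of being undefined.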

Newtype to alias migration We simply remove all occurrences of the associated 
constructor both in pattern and expression forms. We require that the relevant new- 
type is not covered by any instance declaration of some type class or constructor 
class. Otherwise, we would have to inline these members in a non-obvious way prior
to the removal of the constructor. If we neglected this issue, the resulting program
would either become untypeable, or a different instance would be applied
accidentally, which would be hazardous regarding semantics preservation.
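The following before/after sketch uses a hypothetical newtype Age. Removing the constructor from the pattern and the expression is all that is needed for older, but the custom Show instance illustrates the hazard mentioned above: after the migration, show silently resolves to the instance for Int.

```haskell
-- Before the migration: a hypothetical newtype with a custom instance.
newtype Age = Age Int

instance Show Age where
  show (Age n) = "age " ++ show n

older :: Age -> Age
older (Age n) = Age (n + 1)

-- After the migration to an alias (primed to fit in one module): the
-- constructor has been removed from the pattern and from the expression.
type Age' = Int

older' :: Age' -> Age'
older' n = n + 1
```

The program still typechecks after the migration, yet show (older (Age 41)) gives "age 42" whereas show (older' 41) gives "42".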

Alias to newtype migration This operator requires a non-trivial treatment for func- 
tion declarations. The crucial issue is how to know the following: 

• What expressions have to be wrapped with the newtype constructor? 

• In what patterns does the newtype constructor need to be stripped? 

Our approach is as simple as possible. We observe that the new newtype might be 
used in the declarations of other datatypes. The corresponding patterns and expres- 
sions can be easily located and adapted as in the case of permutation, grouping, and 
ungrouping (recall η-conversion etc.). We also need to adapt function declarations
if their argument or result types are known to refer to the relevant alias. This ba- 
sically means that we need to access the affected arguments and result expressions 
in all relevant equations to unwrap the arguments and wrap the result expressions. 
These adaptations are slightly complicated by the fact that the affected type alias 
can occur in arbitrarily nested locations. 
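A minimal sketch of the function-level adaptation, assuming a hypothetical alias Celsius that occurs nested inside a list type: the argument is unwrapped element-wise with map, and the result expression is wrapped with the new constructor. The function is written in η-expanded form so that its argument is accessible; CelsiusN and the local helper unwrap are hypothetical names.

```haskell
-- Before the migration: a hypothetical alias, used nested inside a list.
type Celsius = Double

warmest :: [Celsius] -> Celsius
warmest xs = maximum xs

-- After the migration to a newtype (renamed here to fit in one module).
newtype CelsiusN = CelsiusN Double
  deriving (Eq, Show)

-- The argument is unwrapped element-wise, because the alias occurred inside
-- the list type, and the result expression is wrapped with the constructor.
warmestN :: [CelsiusN] -> CelsiusN
warmestN xs = CelsiusN (maximum (map unwrap xs))
  where
    unwrap (CelsiusN c) = c
```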

In Fig. 19, we illustrate the effect of the alias-to-newtype operator in the introductory
example. We show the top-level interpreter function that maps over the statements 
Top-level interpreter function before the illustrative extraction

run :: Prog -> State ()
run (Prog name decs stats) = mapM_ interpret stats

The same function after extraction

run :: Prog -> State ()
run (Prog name (Block decs stats)) = mapM_ interpret stats

Fig. 19. Function adaptation triggered by alias-to-newtype migration



Input program

type TransRel a = a -> Maybe a
data Maybe' a = Nothing' | Just' a
deadEnd :: TransRel a -> a -> Bool
deadEnd r a = case r a of Nothing -> True
                          Just _  -> False

Output program

type TransRel a = a -> Maybe' a
data Maybe' a = Nothing' | Just' a
deadEnd :: TransRel a -> a -> Bool
deadEnd r a = case toMaybe (r a) of Nothing -> True
                                    Just _  -> False

Induced helper for type swapping

toMaybe :: Maybe' a -> Maybe a
toMaybe Nothing' = Nothing
toMaybe (Just' a) = Just a

Fig. 20. Function adaptation triggered by type swapping

of the program. (The program name and the declarations do not carry any semantics 
here.) The type of the function run exhibits that the meaning of a program is a 
computation that involves a State for the program variables. The adapted version 
of run refers to the extra constructor Block, which resulted from extraction. 



4.6 Swapping types on use sites 

This operator relies on the same techniques as the alias-to-newtype migration.
However, instead of wrapping and unwrapping a constructor, we invoke conversion
functions that mediate between the two structurally equivalent types. These mediators
merely map old to new constructors and vice versa, and hence they are immediately
induced by the datatype transformation itself, namely by the DataUnifiers passed to
the swap operator. This approach implies that we only perform very local changes.
The program code will still work on the old datatypes thanks to the mediators.



The impact of swapping types at the function level is illustrated in Fig. 20. We
deal with the initial steps of the Maybe-to-ConsList migration in Fig. 18, where
we replace the occurrence of Maybe within TransRel by a structurally equivalent
Maybe'. We show an illustrative function deadEnd, which tests whether the given
transition relation allows for a transition from a given state a. The adapted
function deadEnd applies the conversion function toMaybe prior to performing
pattern matching on the obsolete Maybe type.
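Fig. 20 shows only the mediator from Maybe' to Maybe. The sketch below makes the example runnable and adds the opposite mediator under the hypothetical name toMaybe'; both are induced by the constructor correspondence Nothing/Nothing' and Just/Just', and they are mutually inverse.

```haskell
-- Maybe' is the structurally equivalent copy of the Prelude's Maybe.
data Maybe' a = Nothing' | Just' a
  deriving (Eq, Show)

-- Mediators induced by the correspondence Nothing/Nothing' and Just/Just'.
toMaybe :: Maybe' a -> Maybe a
toMaybe Nothing' = Nothing
toMaybe (Just' a) = Just a

toMaybe' :: Maybe a -> Maybe' a
toMaybe' Nothing = Nothing'
toMaybe' (Just a) = Just' a

-- The output program of Fig. 20: deadEnd still matches on Maybe.
type TransRel a = a -> Maybe' a

deadEnd :: TransRel a -> a -> Bool
deadEnd r a = case toMaybe (r a) of
  Nothing -> True
  Just _ -> False
```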
Input program

data Stat = Assign Id Expr | If Expr Stat Stat
interpret :: Stat -> State ()
interpret (Assign i e) = envLookup i >>= \r -> ...
interpret (If e s1 s2) = reval e >>= \v -> ...

Output program

data Stat = Assign Id Expr | If Expr Stat Stat | SBlock Block
interpret :: Stat -> State ()
interpret (Assign i e) = envLookup i >>= \r -> ...
interpret (If e s1 s2) = reval e >>= \v -> ...
interpret (SBlock _) = undefined

Fig. 21. Inclusion of a constructor declaration

4.7 Inclusion vs. exclusion

Intuitively, the inclusion of a constructor should be complemented by the extension
of all relevant case discriminations. This normally means adding a pattern-match
equation (or a case to a case expression) for the new constructor. Dually, the
exclusion of a constructor should be complemented by the removal of all pattern-match
equations (or cases) that refer to this constructor. In the case of added pattern-match
equations, we view the right-hand sides of these equations as a kind of 'hot spot'
to be resolved by subsequent expression-level transformations. To this end, we use
"undefined", i.e., ⊥, as a kind of to-do marker. Dually, in the case of removed
constructors, we also need to replace occurrences of the constructor within
expressions by ⊥. When using interactive tool support, these to-do markers are useful
to control further steps in a transformation scenario.

In Fig. 21, we progress with our running example of an interpreter for an imperative
language. We illustrate the step where blocks are turned into another form of
statement. Hence, the shown output program involves a new pattern-match equation
that interprets statement blocks. This added equation reflects that the meaning
of such blocks is as yet undefined, subject to subsequent adaptations.
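Exclusion is the dual of the inclusion shown in Fig. 21. In the following sketch, a hypothetical constructor Skip has been excluded from a simplified Stat type: its pattern-match equation is gone, and a former occurrence of Skip in an expression has become undefined. Since the marker is only forced when that position is actually evaluated, code that never inspects it keeps working.

```haskell
-- After excluding a hypothetical constructor Skip from
--   data Stat = Assign String Int | Seq Stat Stat | Skip
-- the declaration reads:
data Stat = Assign String Int | Seq Stat Stat

-- The equation  countAssigns Skip = 0  has been removed.
countAssigns :: Stat -> Int
countAssigns (Assign _ _) = 1
countAssigns (Seq s1 s2) = countAssigns s1 + countAssigns s2

-- The former expression  Seq Skip (Assign "x" 1)  now carries the to-do
-- marker undefined in place of Skip.
example :: Stat
example = Seq undefined (Assign "x" 1)
```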



4.8 Insertion vs. deletion 

Inserting a component into a declaration for a constructor C means that all patterns
with C as outermost constructor must be adapted to ignore the added component,
and all applications of C must be completed to include ⊥ for the added component.
Dually, deletion of a component from C means that all applications of C and
all patterns with C as outermost constructor need to be cleaned up to project away
the obsolete component. Any reference to a pattern variable for the obsolete
component is replaced by ⊥. As in the case of permutation and others, η-conversion
is needed to actually get access to constructor components in expressions.
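For insertion, consider a hypothetical Point constructor that receives a third component. Existing patterns get a don't-care pattern, a saturated application is completed with undefined, and a partial application is first λ-pumped and then completed. The names Point, norm1, origin, and atX1 are illustrative only.

```haskell
-- After inserting a third component (a hypothetical String label) into
--   data Point = Point Int Int
-- the declaration reads:
data Point = Point Int Int String

-- Before:  norm1 (Point x y) = abs x + abs y
-- Patterns ignore the new component via a don't-care pattern.
norm1 :: Point -> Int
norm1 (Point x y _) = abs x + abs y

-- Before:  origin = Point 0 0
-- Saturated applications are completed with the to-do marker undefined.
origin :: Point
origin = Point 0 0 undefined

-- Before:  atX1 = Point 1
-- Partial applications are first pumped to \x1 x2 -> Point x1 x2.
atX1 :: Int -> Point
atX1 = (\x1 x2 -> Point x1 x2 undefined) 1
```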

In Fig. 22, the insertion of a constructor component is illustrated by continuing
the scenario from Fig. 20. The adapted equation of toMaybe involves an extended
pattern. As the don't-care pattern "_" indicates, the definition of toMaybe does not
make use of the added component. In fact, the definition of the function deadEnd
does not need to be adapted; it only tests for the availability of a transition step.
Output program

type TransRel a = a -> Maybe' a
data Maybe' a = Nothing' | Just' a (Maybe' a)
deadEnd :: TransRel a -> a -> Bool
deadEnd r a = case toMaybe (r a) of Nothing -> True
                                    Just _  -> False

Induced helper for type swapping

toMaybe :: Maybe' a -> Maybe a
toMaybe Nothing' = Nothing
toMaybe (Just' a _) = Just a

Fig. 22. Illustration of the insertion of a constructor component
Normally, other functions will start to rely on the richer pattern. 



5 Related work 



Transformational program development Formal program transformation [BD77]
separates two concerns: the development of an initial, possibly inefficient program
whose correctness can easily be shown, and the stepwise derivation of a better
implementation in a semantics-preserving manner. Partsch's textbook [Par90]
describes the formal approach to this kind of software development. Pettorossi
and Proietti study typical transformation rules (for functional and logic programs)
in [PP96]. Formal program transformation, in part, also addresses datatype
transformation [dRE98], namely data refinement. Here, one gives different
axiomatisations or implementations of an abstract datatype, which are then related
by well-founded transformation steps. This typically involves some amount of
mathematical program calculation. By contrast, we deliberately focus on the more
syntactical transformations that a programmer uses anyway to adapt evolving programs.

Database schema evolution There is a large body of research addressing the related
problem of database schema evolution [BKKK87], as relevant, for example, in
database re- and reverse engineering [HTJC93]. The schema transformations themselves
can be compared with our datatype transformations only at a superficial level
because of the different formalisms involved. There exist formal frameworks for
the definition of schema transformations, and various formalisms have been
investigated [MP97]. An interesting aspect of database schema evolution is that
schema evolution necessitates a database instance mapping [BCN92]. Compare this with
the evolution of the datatypes in a functional program. Here, the main concern is to
update the function declarations for compliance with the new datatypes. It seems
that the instance mapping problem is a special case of the program update problem.

Refactoring The transformational approach to program evolution is nowadays called
refactoring [Opd92, Fow99], but the idea is not new [ABFP86, GN90]. Refactoring
means improving the structure of code so that it becomes more comprehensible,
maintainable, and adaptable. Interactive refactoring tools are being studied
and used extensively in the object-oriented programming context [Moo96, RBJ97].
Typical examples of functional program refactorings are described in [Lam00], e.g.,
the introduction of a monad in a non-monadic program. The precise characterisation of
the refactoring notion for functional programming is being addressed in a project
at the University of Kent by Thompson and Reinke; see [TR01]. There is also
related work on type-safe meta-programming in a functional context, e.g., by
Erwig [ER02]. Previous work did not specifically address datatype transformations.
The refactorings for object-oriented class structures are not directly applicable
because of the different structure and semantics of classes vs. algebraic datatypes.

Structure editing Support for interactive transformations can be seen as a
sophistication of structure editing [RT88, pbo94, KS9E]. This link between
transformation and editing is particularly appealing for our "syntactical"
transformations. Not surprisingly, concepts that were developed for structure editing
are related to our work. For example, in [SdM99], primitives of structure editing are
identified based on the notion of a focus to select subtrees, and on the navigation
primitives left, right, up, and down. Trees, subtrees, and paths are defined there
as follows:

data Tree = Fork Label [Tree]
type SubTree = (Path, Tree)
type Path = [Layer]
type Layer = (Label, [Tree], [Tree])

The t in a subtree (p, t) is the currently selected tree, and it sits between the left
and right trees in the top layer (the head of p). This approach does not account for
the heterogeneous character of language syntaxes, but it shows that the presence of a
focus in a term can be encoded in types.
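The navigation primitives themselves are not spelled out above. One plausible definition is the following sketch, which assumes that Label is String and that the left siblings in a layer are stored nearest-first (a choice made here, not necessarily the one in [SdM99]); each move returns Nothing when it is impossible, and root is a hypothetical helper.

```haskell
type Label = String

data Tree = Fork Label [Tree]
  deriving (Eq, Show)

-- A layer holds the parent's label and the left and right siblings of the
-- focused tree; left siblings are stored nearest-first (an assumption).
type Layer = (Label, [Tree], [Tree])
type Path = [Layer]
type SubTree = (Path, Tree)

-- Focus on the root of a tree.
root :: Tree -> SubTree
root t = ([], t)

left :: SubTree -> Maybe SubTree
left ((l, t' : ls, rs) : p, t) = Just ((l, ls, t : rs) : p, t')
left _ = Nothing

right :: SubTree -> Maybe SubTree
right ((l, ls, t' : rs) : p, t) = Just ((l, t : ls, rs) : p, t')
right _ = Nothing

-- Rebuild the parent from the top layer and the focused tree.
up :: SubTree -> Maybe SubTree
up ((l, ls, rs) : p, t) = Just (p, Fork l (reverse ls ++ t : rs))
up ([], _) = Nothing

-- Move the focus to the leftmost child.
down :: SubTree -> Maybe SubTree
down (p, Fork l (t : ts)) = Just ((l, [], ts) : p, t)
down (_, Fork _ []) = Nothing
```

Moves compose in the Maybe monad, e.g., down (root t) >>= right >>= up returns to the root with t unchanged.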



6 Concluding remarks 

Contribution We identified the fundamental primitives for datatype transformation. 
These operators are meant to support common scenarios of program adaptation in 
functional programming, or other settings where algebraic datatypes play a role. 
In fact, all the identified operators are universal in the sense that they are also
meaningful for program abstractions other than datatypes, e.g., function declarations.
We deliberately focused on adaptations of datatypes because a vast body of
previous work addressed fold/unfold transformations for recursive functions. De- 
spite the focus on datatype transformations, we had to consider program trans- 
formations that are necessitated by the modification of datatypes. Regarding the 
executable specification of the operator suite, we adhered to the formula: meta- 
programs = object-programs = Haskell programs. We employed generic functional 
programming in the interest of conciseness. We also employed designated means 
of referring to fragments of interest, e.g., a focus concept. 

Partial project failure We are confident that the identified operators are sufficient 
and appropriate for actual datatype transformations. We have attempted to comple- 
ment this framework development by actual interactive tool support. We initially 
thought that using Haskell for this interactive tooling as well would be a good idea. 
Since the actual transformation operators are implemented in Haskell anyway, and 
the interactive dialogues need to cooperate with the operator framework to perform 
analyses, Haskell indeed seems to be the obvious choice. To make a long story 
short, there are many GUI libraries for Haskell, but none of them is currently
suitable for developing a sophisticated GUI for interactive program transformation.
It seems that environments for interactive language tools would provide a
better starting point, e.g., environments based on attribute grammars [RT88, pS9^].

Perspective To cover full Haskell, a few further operators would have to be added 
to our suite, in particular, operators that support type and constructor classes. We 
should also pay full attention to some idiosyncrasies of Haskell, e.g., refutable vs.
irrefutable patterns. Then, there are also transformation techniques that seem to
go beyond our notion of program evolution but it is interesting to cover them any- 
way. We think of techniques like turning a system of datatypes into functorial style, 
or threading a parameter through a system of datatypes. The ultimate perspective 
for the presented work is to integrate the datatype transformations into a complete, 
well-founded, and user-friendly refactoring tool for functional programming along 
the lines of Thompson's and Reinke's research project [TR01]. Another perspective
for our research is to further pursue the intertwined character of datatype and
program transformations in the context of XML format and API evolution. 